Geodesic Distance Histogram Feature for Video Segmentation
This paper proposes a geodesic-distance-based feature that encodes global
information for improved video segmentation algorithms. The feature is a joint
histogram of intensity and geodesic distances, where the geodesic distances are
computed as the shortest paths between superpixels via their boundaries. We
also incorporate adaptive voting weights and spatial pyramid configurations to
include spatial information into the geodesic histogram feature and show that
this further improves results. The feature is generic and can be used as part
of various algorithms. In experiments, we test the geodesic histogram feature
by incorporating it into two existing video segmentation frameworks. This leads
to significantly better performance in 3D video segmentation benchmarks on two
datasets.
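The core construction can be sketched in a few lines. The snippet below is a toy illustration under assumed inputs (hypothetical superpixel mean intensities and a boundary-weighted adjacency graph, not data from the paper): geodesic distances are shortest paths over boundary costs, and the feature is a normalised joint histogram of intensity and geodesic distance.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import dijkstra

# Toy setup: 5 superpixels with mean intensities, and an adjacency graph
# whose edge weights encode boundary strength between neighbours.
intensity = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
edges = {(0, 1): 0.2, (1, 2): 0.5, (2, 3): 0.1, (3, 4): 0.4, (0, 2): 0.9}

n = len(intensity)
w = np.zeros((n, n))
for (i, j), cost in edges.items():
    w[i, j] = w[j, i] = cost

# Geodesic distance = cheapest boundary-crossing path between superpixels.
geo = dijkstra(csr_matrix(w), directed=False)

def geodesic_histogram(src, bins=4):
    """Joint histogram of (intensity, geodesic distance) seen from `src`."""
    others = np.arange(n) != src
    hist, _, _ = np.histogram2d(
        intensity[others], geo[src, others], bins=bins,
        range=[[0, 1], [0, geo[np.isfinite(geo)].max()]])
    return hist / hist.sum()  # normalise so features compare across regions

feat = geodesic_histogram(0)
```

Each superpixel thus carries a global descriptor of how intensity is distributed as a function of boundary-aware distance from it, which is what makes the feature usable as a drop-in term in different segmentation frameworks.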
Structure, dynamics and bifurcations of discrete solitons in trapped ion crystals
We study discrete solitons (kinks) accessible in state-of-the-art trapped ion
experiments, considering zigzag crystals and quasi-3D configurations, both
theoretically and experimentally. We first extend the theoretical understanding
of different phenomena predicted and recently experimentally observed in the
structure and dynamics of these topological excitations. Employing tools from
topological degree theory, we analyze bifurcations of crystal configurations in
dependence on the trapping parameters, and investigate the formation of kink
configurations and the transformations of kinks between different structures.
This allows us to accurately define and calculate the effective potential
experienced by solitons within the Wigner crystal, and study how this
(so-called Peierls-Nabarro) potential gets modified to a nonperiodic globally
trapping potential in certain parameter regimes. The kinks' rest mass (energy)
and spectrum of modes are computed and the dynamics of linear and nonlinear
kink oscillations are analyzed. We also present novel, experimentally observed,
configurations of kinks incorporating a large-mass defect realized by an
embedded molecular ion, and of pairs of interacting kinks stable for long
times, offering the prospect of exploring and exploiting complex collective
nonlinear excitations, controllable at the quantum level.
Comment: 25 pages, 10 figures; v2 corrects Fig. 2 and adds some text and reference
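As a minimal caricature of the effective-potential idea (the standard Frenkel-Kontorova picture, not the paper's zigzag ion crystal), the Peierls-Nabarro potential is the static crystal energy minimised over kink configurations with a prescribed centre X:

```latex
E[\{x_n\}] = \sum_n \Big[ \frac{\kappa}{2}\,(x_{n+1}-x_n)^2 + V(x_n) \Big],
\qquad
V_{\mathrm{PN}}(X) = \min_{\{x_n\}\,\text{kink centred at } X} E[\{x_n\}] .
```

In a homogeneous lattice V_PN(X) is periodic with the lattice period; the regime described in the abstract is one where this periodicity is lost and the potential instead traps the kink globally.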
Learning to Extract Motion from Videos in Convolutional Neural Networks
This paper shows how to extract dense optical flow from videos with a
convolutional neural network (CNN). The proposed model constitutes a potential
building block for deeper architectures to allow using motion without resorting
to an external algorithm, e.g. for recognition in videos. We derive our network
architecture from signal processing principles to provide desired invariances
to image contrast, phase and texture. We constrain weights within the network
to enforce strict rotation invariance and substantially reduce the number of
parameters to learn. We demonstrate end-to-end training on only 8 sequences of
the Middlebury dataset, orders of magnitude fewer than competing CNN-based
motion estimation methods require, and obtain performance comparable to classical
methods on the Middlebury benchmark. Importantly, our method outputs a
distributed representation of motion that can represent multiple,
transparent motions and dynamic textures. Our contributions on network design
and rotation invariance offer insights that are not specific to motion estimation.
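The weight-constraint idea can be illustrated with a toy sketch (my own construction, not the paper's architecture): a single base kernel is shared across rotated copies, so the orientation bank adds no parameters, and pooling responses over orientations yields a score invariant to (here, exact 90-degree) rotations of the input.

```python
import numpy as np

rng = np.random.default_rng(0)
base = rng.standard_normal((3, 3))
# Weight-tied orientation bank: four exact 90-degree rotations of one kernel.
bank = [np.rot90(base, k) for k in range(4)]

def conv2d_valid(img, k):
    """Plain valid-mode cross-correlation with a 3x3 kernel."""
    h, w = img.shape[0] - 2, img.shape[1] - 2
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(img[i:i + 3, j:j + 3] * k)
    return out

def oriented_response(img):
    # Max-pooling over orientations removes dependence on input rotation:
    # rotating the image just permutes which bank member fires strongest.
    return np.max([np.abs(conv2d_valid(img, k)).sum() for k in bank])

img = rng.standard_normal((8, 8))
r1 = oriented_response(img)
r2 = oriented_response(np.rot90(img))  # same score for the rotated input
```

The parameter count stays that of one filter regardless of how many orientations are pooled, which is the spirit of the reduction the abstract describes.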
Point-wise mutual information-based video segmentation with high temporal consistency
In this paper, we tackle the problem of temporally consistent boundary
detection and hierarchical segmentation in videos. While finding the best
high-level reasoning of region assignments in videos is the focus of much
recent research, temporal consistency in boundary detection has so far only
rarely been tackled. We argue that temporally consistent boundaries are a key
component to temporally consistent region assignment. The proposed method is
based on the point-wise mutual information (PMI) of spatio-temporal voxels.
Temporal consistency is established by an evaluation of PMI-based point
affinities in the spectral domain over space and time. Thus, the proposed
method is independent of any optical flow computation or previously learned
motion models. The proposed low-level video segmentation method outperforms the
learning-based state of the art in terms of standard region metrics.
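A toy 1D version of the affinity construction (with a plain empirical PMI estimate rather than the paper's density model, and made-up data) shows the mechanism: neighbouring samples are scored by the pointwise mutual information of their quantised values, and a spectral bipartition of the resulting affinity graph recovers the two regions without any optical flow or motion model.

```python
import numpy as np

rng = np.random.default_rng(1)
# Two constant regions plus noise stand in for spatio-temporal voxels.
signal = np.concatenate([rng.normal(0.1, 0.02, 20), rng.normal(0.9, 0.02, 20)])
sym = np.clip((signal * 4).astype(int), 0, 3)   # quantise to 4 symbols

# Empirical joint distribution of symbols at neighbouring positions.
joint = np.full((4, 4), 1e-6)                   # tiny smoothing
for a, b in zip(sym, sym[1:]):
    joint[a, b] += 1
    joint[b, a] += 1
joint /= joint.sum()
marg = joint.sum(axis=1)

def pmi(a, b):
    return np.log(joint[a, b] / (marg[a] * marg[b]))

n = len(sym)
W = np.zeros((n, n))
for i in range(n - 1):                          # chain-graph affinities
    W[i, i + 1] = W[i + 1, i] = np.exp(pmi(sym[i], sym[i + 1]))

# Spectral bipartition: the sign of the Fiedler vector cuts the weak link,
# which sits exactly at the rarely co-occurring (hence low-PMI) boundary pair.
L = np.diag(W.sum(axis=1)) - W
_, vecs = np.linalg.eigh(L)
labels = (vecs[:, 1] > 0).astype(int)
```

Because the affinities come purely from co-occurrence statistics, frequent (within-region) pairs get high PMI and the rare boundary pair gets low PMI, which is what makes the cut land on the true boundary.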
Video Object Detection with an Aligned Spatial-Temporal Memory
We introduce Spatial-Temporal Memory Networks for video object detection. At
its core, a novel Spatial-Temporal Memory module (STMM) serves as the recurrent
computation unit to model long-term temporal appearance and motion dynamics.
The STMM's design enables full integration of pretrained backbone CNN weights,
which we find to be critical for accurate detection. Furthermore, in order to
tackle object motion in videos, we propose a novel MatchTrans module to align
the spatial-temporal memory from frame to frame. Our method produces
state-of-the-art results on the benchmark ImageNet VID dataset, and our
ablative studies clearly demonstrate the contribution of our different design
choices. We release our code and models at
http://fanyix.cs.ucdavis.edu/project/stmn/project.html
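The MatchTrans idea can be sketched independently of the full network (this is my reading of the alignment step, with made-up shapes, not the released code): each memory cell is re-assembled from the previous frame's memory within a small window, weighted by a softmax over feature similarity, so the memory follows the object's motion.

```python
import numpy as np

def match_trans(feat_cur, feat_prev, mem_prev, R=1):
    """Align the previous memory to the current frame by local matching."""
    H, W, C = feat_cur.shape
    aligned = np.zeros_like(mem_prev)
    for i in range(H):
        for j in range(W):
            # Candidate source cells in a (2R+1)^2 window of the prev frame.
            ii = np.arange(max(i - R, 0), min(i + R + 1, H))
            jj = np.arange(max(j - R, 0), min(j + R + 1, W))
            cand_f = feat_prev[np.ix_(ii, jj)].reshape(-1, C)
            cand_m = mem_prev[np.ix_(ii, jj)].reshape(-1, mem_prev.shape[2])
            score = cand_f @ feat_cur[i, j]     # similarity to each candidate
            w = np.exp(score - score.max())
            w /= w.sum()                        # softmax matching weights
            aligned[i, j] = w @ cand_m          # weighted copy of old memory
    return aligned

rng = np.random.default_rng(2)
feat_prev = rng.standard_normal((6, 6, 64))
feat_cur = np.roll(feat_prev, 1, axis=1)        # simulate a one-cell shift
mem_prev = rng.standard_normal((6, 6, 2))
aligned = match_trans(feat_cur, feat_prev, mem_prev)
```

Away from the borders the aligned memory is, up to the softmax's residual mass, the previous memory shifted by the same displacement as the features, which is exactly the behaviour the alignment step is meant to provide before the recurrent update.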
Dense Motion Estimation for Smoke
Motion estimation for highly dynamic phenomena such as smoke is an open
challenge for Computer Vision. Traditional dense motion estimation algorithms
have difficulties with non-rigid and large motions, both of which are
frequently observed in smoke motion. We propose an algorithm for dense motion
estimation of smoke. Our algorithm is robust, fast, and performs better
across different types of smoke than other dense motion estimation
algorithms, including state-of-the-art and neural network approaches. The key
to our contribution is to use skeletal flow, without explicit point matching,
to provide a sparse flow. This sparse flow is upgraded to a dense flow. In this
paper we describe our algorithm in greater detail, and provide experimental
evidence to support our claims.
Comment: ACCV201
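The sparse-to-dense upgrade can be sketched on its own (the skeletal-flow extraction itself is the paper's contribution and is not reproduced here; the points and vectors below are invented): flow known only at a few skeleton points is interpolated out to every pixel.

```python
import numpy as np
from scipy.interpolate import griddata

# Hypothetical sparse flow: (y, x) skeleton points and their flow vectors.
h, w = 16, 16
skel_pts = np.array([[2, 2], [2, 13], [13, 2], [13, 13], [8, 8]], float)
skel_flow = np.array([[1.0, 0.0], [1.0, 0.5], [0.5, 0.0],
                      [0.0, 0.5], [0.7, 0.3]])

ys, xs = np.mgrid[0:h, 0:w]
# Upgrade sparse -> dense: linear interpolation inside the convex hull of
# the skeleton points, zero flow outside it.
dense = np.dstack([
    griddata(skel_pts, skel_flow[:, k], (ys, xs),
             method='linear', fill_value=0.0)
    for k in range(2)
])
```

Linear interpolation is just one possible upgrade; any scattered-data interpolant (RBF, edge-aware schemes such as those used in EpicFlow-style pipelines) could fill the same role, and the sparse flow at the skeleton points is reproduced exactly.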
Eureka-Moments in Transformers: Multi-Step Tasks Reveal Softmax Induced Optimization Problems
In this work, we study rapid, step-wise improvements of the loss in
transformers when being confronted with multi-step decision tasks. We found
that transformers struggle to learn the intermediate tasks, whereas CNNs have
no such issue on the tasks we studied. When transformers learn the intermediate
task, they do this rapidly and unexpectedly, after both training and
validation loss have saturated for hundreds of epochs. We call these rapid improvements
Eureka-moments, since the transformer appears to suddenly learn a previously
incomprehensible task. Similar leaps in performance have become known as
Grokking. In contrast to Grokking, for Eureka-moments, both the validation and
the training loss saturate before rapidly improving. We trace the problem back
to the Softmax function in the self-attention block of transformers and show
ways to alleviate the problem. These fixes improve training speed: the
improved models reach 95% of the baseline model's performance in just 20% of
the training steps, are much more likely to learn the intermediate task,
reach higher final accuracy, and are more robust to hyper-parameters.
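The softmax issue can be illustrated numerically (a generic toy, not the paper's exact remedy): once one attention logit dominates, the softmax Jacobian diag(p) - p p^T collapses and almost no gradient flows through the attention weights; rescaling the logits (a temperature, one of several possible fixes) restores a usable gradient.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def jacobian_norm(z):
    """Frobenius norm of d softmax(z) / d z = diag(p) - p p^T."""
    p = softmax(z)
    J = np.diag(p) - np.outer(p, p)
    return np.linalg.norm(J)

logits = np.array([12.0, 0.0, 0.0, 0.0])   # saturated attention pattern
sharp = jacobian_norm(logits)              # ~0: gradients barely flow
tempered = jacobian_norm(logits / 4.0)     # temperature tau=4 revives them
```

With saturated logits the attention is effectively frozen on its current (wrong) pattern, which matches the long loss plateau the abstract describes; a milder softmax keeps the attention trainable so the intermediate task can still be discovered.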
A Multi-cut Formulation for Joint Segmentation and Tracking of Multiple Objects
Recently, Minimum Cost Multicut Formulations have been proposed and proven to be successful in both motion trajectory segmentation and multi-target tracking scenarios. Both tasks benefit from decomposing a graphical model into an optimal number of connected components based on attractive and repulsive pairwise terms. The two tasks are formulated on different levels of granularity and, accordingly, leverage mostly local information for motion segmentation and mostly high-level information for multi-target tracking. In this paper we argue that point trajectories and their local relationships can contribute to the high-level task of multi-target tracking and also argue that high-level cues from object detection and tracking are helpful to solve motion segmentation. We propose a joint graphical model for point trajectories and object detections whose Multicuts are solutions to motion segmentation and multi-target tracking problems at once. Results on the FBMS59 motion segmentation benchmark as well as on pedestrian tracking sequences from the 2D MOT 2015 benchmark demonstrate the promise of this joint approach
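The multicut objective itself is compact enough to state in code (a brute-force toy with invented edge costs; real instances need the specialised solvers this line of work builds on): cutting edge e pays its cost w_e, so attractive edges (w > 0) resist cutting and repulsive edges (w < 0) reward it, and enumerating node labelings automatically yields consistent cuts, since no cycle can cross a label boundary exactly once.

```python
from itertools import product

# Toy graph on 4 nodes: attractive edges (positive cost to cut) inside the
# intended groups, repulsive edges (negative cost to cut) between them.
edges = {(0, 1): 2.0, (2, 3): 1.5, (1, 2): -1.0, (0, 3): -0.5}

def multicut_cost(labels):
    """Total cost of the edges cut by a node labeling."""
    return sum(w for (i, j), w in edges.items() if labels[i] != labels[j])

# Brute-force search over all labelings (feasible only for toy graphs).
best = min(product(range(4), repeat=4), key=multicut_cost)
components = len(set(best))
```

On this instance the optimum cuts exactly the two repulsive edges, splitting the graph into the two attractive groups {0, 1} and {2, 3}; the joint model in the paper plays the same game on a graph that mixes point trajectories and object detections.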
TraMNet - Transition Matrix Network for Efficient Action Tube Proposals
Current state-of-the-art methods solve spatiotemporal action localisation by
extending 2D anchors to 3D-cuboid proposals on stacks of frames, to generate
sets of temporally connected bounding boxes called action micro-tubes.
However, they fail to consider that the underlying anchor proposal hypotheses
should also move (transition) from frame to frame, as the actor or the camera
does. Assuming we evaluate n 2D anchors in each frame, then the number of
possible transitions from each 2D anchor to the next, for a sequence of f
consecutive frames, is in the order of n^f, expensive even for small
values of n. To avoid this problem, we introduce a Transition-Matrix-based
Network (TraMNet) which relies on computing transition probabilities between
anchor proposals while maximising their overlap with ground truth bounding
boxes across frames, and enforcing sparsity via a transition threshold. As the
resulting transition matrix is sparse and stochastic, this reduces the proposal
hypothesis search space from n^f to the cardinality of the thresholded
matrix. At training time, transitions are specific to cell locations of the
feature maps, so that a sparse (efficient) transition matrix is used to train
the network. At test time, a denser transition matrix can be obtained either by
decreasing the threshold or by adding to it all the relative transitions
originating from any cell location, allowing the network to handle transitions
in the test data that might not have been present in the training data, and
making detection translation-invariant. Finally, we show that our network can
handle sparse annotations such as those available in the DALY dataset. We
report extensive experiments on the DALY, UCF101-24 and Transformed-UCF101-24
datasets to support our claims.
Comment: 15 pages
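The transition-matrix idea can be sketched with toy statistics (invented tracks over a handful of anchor cells, not the trained network): count which cell a ground-truth box occupies in the next frame, normalise the counts into row-stochastic transition probabilities, and threshold so the proposal search space shrinks to the surviving entries.

```python
import numpy as np

n_cells = 4
# Hypothetical ground-truth tracks, one anchor-cell index per frame.
tracks = [[0, 0, 1, 1, 2], [1, 2, 2, 3, 3], [0, 1, 1, 2, 3]]

T = np.zeros((n_cells, n_cells))
for tr in tracks:
    for a, b in zip(tr, tr[1:]):        # observed frame-to-frame transitions
        T[a, b] += 1
T /= T.sum(axis=1, keepdims=True)       # row-stochastic transition matrix

thresh = 0.3                            # sparsity via a transition threshold
sparse_T = np.where(T >= thresh, T, 0.0)
n_hypotheses = int((sparse_T > 0).sum())   # vs n_cells**2 dense transitions
```

At test time the threshold can be lowered (or rows made denser) to admit transitions never seen in training, which is the mechanism the abstract uses to make detection translation-invariant.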
A Multi-scale Bilateral Structure Tensor Based Corner Detector
9th Asian Conference on Computer Vision, ACCV 2009, Xi'an, 23-27 September 2009
In this paper, a novel multi-scale nonlinear structure tensor based corner detection algorithm is proposed to effectively improve the classical Harris corner detector. By considering both the spatial and gradient distances of neighboring pixels, a nonlinear bilateral structure tensor is constructed to examine the image local pattern. It can be seen that the linear structure tensor used in the original Harris corner detector is a special case of the proposed bilateral one, obtained by considering only the spatial distance. Moreover, a multi-scale filtering scheme is developed to distinguish trivial structures from true corners based on their different characteristics at multiple scales. A comparison between the proposed approach and four representative, state-of-the-art corner detectors shows that our method has much better performance in terms of both detection rate and localization accuracy.
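The construction can be sketched for a single pixel (a simplified reading with assumed Gaussian kernels, not the paper's exact formulation): the tensor accumulates outer products of neighbouring gradients weighted by spatial distance and, in the bilateral case, also by gradient similarity; dropping the gradient term recovers the linear tensor of the original Harris detector.

```python
import numpy as np

def structure_tensor(img, x, y, r=2, sigma_s=1.5, sigma_g=None):
    """2x2 structure tensor at (x, y); bilateral when sigma_g is given."""
    gy, gx = np.gradient(img.astype(float))
    g0 = np.array([gx[y, x], gy[y, x]])
    J = np.zeros((2, 2))
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            g = np.array([gx[y + dy, x + dx], gy[y + dy, x + dx]])
            w = np.exp(-(dx * dx + dy * dy) / (2 * sigma_s ** 2))
            if sigma_g is not None:
                # Bilateral term: down-weight dissimilar gradients.
                w *= np.exp(-np.sum((g - g0) ** 2) / (2 * sigma_g ** 2))
            J += w * np.outer(g, g)
    return J

img = np.zeros((9, 9))
img[:, 5:] = 1.0                         # vertical step edge
J_lin = structure_tensor(img, 4, 4)      # linear tensor (spatial only)
J_bil = structure_tensor(img, 4, 4, sigma_g=0.1)
```

As in the Harris detector, corners are where both eigenvalues of J are large; here the step edge gives one dominant eigenvalue, and the bilateral weighting can only shrink (never grow) the contribution of gradients unlike the centre's.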